Reddit and Stack Overflow are two community-based websites used heavily by people all over the world. Reddit, known as the “Front Page of the Internet”, is a popular forum, especially among young people, where users can post about almost anything. It has a large international community and a lot of programming-related content. Reddit hosts discussion on virtually any topic, whereas Stack Overflow is centered on programming languages and the ideas and discussions surrounding them. Stack Overflow is primarily a question-and-answer system, while Reddit is mostly built around posts and comments. Both websites have an upvote/downvote system and are valuable resources for programmers around the globe.
We want to use data from Reddit to better understand the popularity of programming languages among Reddit users, and to compare it with data from Stack Overflow. We want to identify which programming languages are discussed on both platforms and compare how their usage and popularity have changed over time. To achieve this, we plan to apply visualization and machine learning methods to the text data, and to use the quantitative signals both platforms provide: upvotes and the number of comments.
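A simple first step toward seeing which languages are discussed is to count keyword mentions in post titles. The sketch below is a starting point under assumptions, not a method the project has settled on: the keyword list `LANGUAGES` and the helper names are hypothetical, and matching is case-insensitive.

```python
import re
from collections import Counter

# Hypothetical keyword list; a real study would need a curated, larger list.
LANGUAGES = ["Python", "JavaScript", "Java", "C", "C++", "C#", "Go", "Rust"]


def build_pattern(name: str) -> re.Pattern:
    # Lookarounds instead of \b so that "C" does not match inside "C++" or "C#",
    # and "Java" does not match inside "JavaScript".
    return re.compile(r"(?<![\w+#])" + re.escape(name) + r"(?![\w+#])", re.IGNORECASE)


PATTERNS = {lang: build_pattern(lang) for lang in LANGUAGES}


def count_mentions(titles):
    """Count how many titles mention each language (at most once per title)."""
    counts = Counter()
    for title in titles:
        for lang, pattern in PATTERNS.items():
            if pattern.search(title):
                counts[lang] += 1
    return counts
```

Plain keyword matching over-counts ambiguous names such as "Go" (the verb) and misses aliases like "JS" or "golang", so the text-based machine learning methods mentioned above would be needed to go beyond this baseline.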
Since both platforms are very popular and contain a great deal of programming discussion, it is worthwhile to study how users' engagement with the most popular programming languages has trended. Tracking these trends would help us understand how the popularity of programming languages has changed over time, and could let us predict which languages are likely to be the most discussed or most asked about on these two platforms in the near future.
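As a naive baseline for predicting a trend, one could fit a straight line to yearly mention counts for a language and extrapolate it. The sketch below does ordinary least squares by hand; `fit_linear_trend`, `forecast` and the example numbers are hypothetical and purely illustrative, not project data.

```python
def fit_linear_trend(years, counts):
    """Ordinary least-squares fit of counts ~ slope * year + intercept."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(years, counts, target_year):
    """Extrapolate the fitted line to target_year."""
    slope, intercept = fit_linear_trend(years, counts)
    return slope * target_year + intercept
```

For example, illustrative counts of 100, 120 and 140 for 2018 to 2020 give `forecast([2018, 2019, 2020], [100, 120, 140], 2021) == 160.0`. A linear fit ignores saturation and sudden shifts, so it serves only as a reference point for more elaborate models.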
The objective of our project is to find correlations between different parameters of the questions and posts on these platforms and to measure how the popularity of programming languages has changed over time. We would then like to predict how these trends will develop in the near future. We could also use this information to relate programming languages to the kinds of problems or topics most associated with them, and to analyze which topics or solutions are most common for certain kinds of problems. We will try to answer the following research questions:
For the Reddit posts, we plan to use a Reddit API to collect data sets for a given time range and a set of specific subreddits. The choice of subreddits is crucial for the quality and expressiveness of our data and will be based on prior research into subreddits relevant to programming. From this data we can obtain the subreddit, title, text, upvotes and various metadata for each post.
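A minimal sketch of the collection step, assuming Reddit's public JSON listing endpoint (appending `.json` to a subreddit URL) rather than a particular client library; the project may instead use the official OAuth API or a wrapper such as PRAW. The User-Agent string and the function names are placeholders. Flattening each post's `data` object under a `data.` prefix yields column names like those in the sample table.

```python
import json
import urllib.request

# Fields shown in the sample table; names follow Reddit's listing JSON.
FIELDS = ["subreddit", "title", "id", "created", "created_utc",
          "upvote_ratio", "ups", "score", "num_comments"]


def fetch_listing(subreddit, after=None, limit=100):
    """Fetch one page of the newest posts in a subreddit (needs network access)."""
    url = f"https://www.reddit.com/r/{subreddit}/new.json?limit={limit}"
    if after:
        url += f"&after={after}"
    # Reddit asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = urllib.request.Request(url, headers={"User-Agent": "language-trends-study/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flatten_listing(listing):
    """Turn a listing into flat rows keyed like 'data.subreddit'.

    Returns the rows and the 'after' cursor for the next page (or None).
    """
    rows = []
    for child in listing["data"]["children"]:
        post = child["data"]
        rows.append({f"data.{name}": post.get(name) for name in FIELDS})
    return rows, listing["data"].get("after")
```

Pagination works by passing the returned `after` value back into `fetch_listing` until it is `None`; restricting to a time range can then be done on `data.created_utc`.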
For Stack Overflow, we plan to use the data dumps available on the Internet Archive, merge them, and preprocess them further for the analysis, visualization and prediction tasks.
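The Stack Overflow dump stores posts in a large `Posts.xml` file of `<row>` elements, so it is best streamed rather than loaded at once. The sketch below counts questions (`PostTypeId="1"`) per year for a chosen set of tags using `xml.etree.ElementTree.iterparse`. It assumes the attribute names `PostTypeId`, `CreationDate` and `Tags` used in the Stack Exchange dumps; the tag encoding has varied between dump versions, so handling both `<python><pandas>` and `|python|pandas|` is a precaution, not a guarantee.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def parse_tags(raw):
    """Split a Tags attribute into tag names."""
    # Older dumps store tags as "<python><pandas>"; a "|python|pandas|" form is also handled.
    if raw.startswith("<"):
        return re.findall(r"<([^>]+)>", raw)
    return [t for t in raw.split("|") if t]


def tag_counts_per_year(source, tags_of_interest):
    """Stream Posts.xml (path or file object) and count questions per (year, tag)."""
    counts = defaultdict(Counter)
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row" and elem.get("PostTypeId") == "1":  # "1" marks a question
            created = elem.get("CreationDate")  # e.g. "2019-05-01T10:00:00.000"
            if created:
                year = int(created[:4])
                for tag in parse_tags(elem.get("Tags", "")):
                    if tag in tags_of_interest:
                        counts[year][tag] += 1
        elem.clear()  # drop the row's attributes to keep memory use low
    return counts
```

Stack Overflow tag names are lowercase (for example `python`, `javascript`, `c++`), so `tags_of_interest` should use that form.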
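Sample of the collected Reddit data (three posts from r/coding; data.created and data.created_utc are Unix timestamps in seconds):

| data.subreddit | data.title | data.id | data.created | data.created_utc | data.upvote_ratio | data.ups | data.score | data.num_comments |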
|---|---|---|---|---|---|---|---|---|
| coding | I built this Lottie animation editor to edit Lottie animations without After Effects! If like me, you use Lottie animations as part of your frontend UI but struggle with After Effects or implementation issues, would like to know what you think | nh9900 | 1621567880 | 1621539080 | 0.88 | 13 | 13 | 1 |
| coding | Back-End VS Front-End Framework \| 6 J.S. Frameworks Experts Love - Untied Blogs | nh0yzf | 1621547972 | 1621519172 | 0.33 | 0 | 0 | 1 |
| coding | File Descriptor Limits | ngzeep | 1621543958 | 1621515158 | 0.40 | 0 | 0 | 0 |